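In light of the success of transferring language models to NLP tasks, we ask whether the full BERT model is always the best choice, and whether there is a simple but effective method for finding the winning ticket in state-of-the-art deep neural networks without complex computation. We construct a series of BERT-based models of different sizes and compare them on 8 binary classification tasks. The results show that smaller sub-networks that outperform the full model do exist. We then present a further study and propose a simple method to shrink the model appropriately before fine-tuning. Extended experiments indicate that our method can save time and storage overhead with little or even no loss of accuracy.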
translated by Google Translate
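Federated meta-learning (FML) has emerged as a promising paradigm for coping with the data limitation and heterogeneity challenges in today's edge learning arena. However, its performance is often limited by slow convergence and correspondingly low communication efficiency. Moreover, since the available radio spectrum and the energy capacity of IoT devices are usually insufficient, it is crucial to control resource allocation and energy consumption when deploying FML in practical wireless networks. To overcome these challenges, in this paper we rigorously analyze each device's contribution to the reduction of the global loss in each round, and develop an FML algorithm (called NuFM) with a non-uniform device selection scheme to accelerate convergence. We then formulate a resource allocation problem that integrates NuFM into multi-channel wireless systems, jointly improving the convergence rate and minimizing the wall-clock time and energy cost. By deconstructing the original problem step by step, we devise a joint device selection and resource allocation strategy that solves the problem with theoretical guarantees. Further, we show that the computational complexity of NuFM can be reduced from $O(d^2)$ to $O(d)$ (where $d$ is the model dimension) by combining two first-order approximation techniques. Extensive simulation results demonstrate the effectiveness and superiority of the proposed methods compared with existing baselines.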
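Existing multi-strategy adaptive differential evolution (DE) typically runs trials of multiple strategies and then rewards the better-performing ones with more resources. However, trials of an exploitative or explorative strategy may lead to over-exploitation or over-exploration. To improve performance, this paper proposes a new strategy adaptation method, named the explicit adaptation scheme (EA scheme), which separates the multiple strategies and employs them on demand. This is done by dividing the evolution process into several selective-candidate with similarity selection (SCSS) generations and adaptive generations. In the SCSS generations, the exploitation and exploration needs are learned by using a balanced strategy. To meet these needs, in the adaptive generations, one of two other strategies, exploitative or explorative, is used adaptively. Experimental studies on benchmark functions demonstrate the effectiveness of the EA scheme compared with its variants and other adaptation methods. Furthermore, performance comparisons with state-of-the-art evolutionary algorithms and swarm-intelligence-based algorithms show that EADE is very competitive.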
The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, with the emergence of state-of-the-art models with 100B+ parameters, large language models are increasingly expensive to design and train accurately. Notably, it can be difficult to evaluate how modeling decisions may impact emergent capabilities, given that these capabilities arise mainly from sheer scale. In the process of building BLOOM (the BigScience Large Open-science Open-access Multilingual language model), our goal is to identify an architecture and training setup that makes the best use of our budget of 1,000,000 A100-GPU-hours. Specifically, we perform an ablation study at the billion-parameter scale comparing different modeling practices and their impact on zero-shot generalization. In addition, we study the impact of various popular pre-training corpora on zero-shot generalization. We also study the performance of a multilingual model and how it compares to an English-only one. Finally, we consider the scaling behaviour of Transformers to choose the target model size, shape, and training setup. All our models and code are open-sourced at https://huggingface.co/bigscience .
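Clustering is a fundamental machine learning task that has been widely studied in the literature. Classic clustering methods assume that the data are represented as features in vectorized form through various representation learning techniques. As data become increasingly complex, shallow (traditional) clustering methods can no longer handle high-dimensional data types. With the great success of deep learning, especially deep unsupervised learning, many representation learning techniques with deep architectures have been proposed over the past decade. Recently, the concept of deep clustering, i.e., jointly optimizing representation learning and clustering, has been proposed and has attracted growing attention in the community. Motivated by the tremendous success of deep learning in clustering, one of the most fundamental machine learning tasks, and by the recent advances in this direction, this survey reviews state-of-the-art deep clustering approaches. We summarize the essential components of deep clustering and categorize existing methods by how they design the interaction between deep representation learning and clustering. In addition, the survey covers popular benchmark datasets, evaluation metrics, and open-source implementations to clearly illustrate various experimental settings. Last but not least, we discuss practical applications of deep clustering and suggest challenging topics that deserve further investigation as future directions.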
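Deep-learning-based pavement crack detection methods usually require large-scale labels with detailed crack location information in order to learn accurate predictions. In practice, however, crack locations are very difficult to annotate manually because of the varied visual patterns of pavement cracks. In this paper, we propose a deep domain adaptation-based crack detection network (DDACDN), which learns to exploit source-domain knowledge to predict multi-category crack location information in the target domain, where only image-level labels are available. Specifically, DDACDN first extracts crack features from both the source and target domains with a two-branch, weight-sharing backbone network. Then, to achieve cross-domain adaptation, an intermediate domain is constructed by aggregating three-scale features from the feature space of each domain, so that crack features from the source domain are adapted to the target domain. Finally, the network incorporates the knowledge of both domains and is trained to recognize and localize pavement cracks. To facilitate accurate training and validation of domain adaptation, we use two challenging pavement crack datasets, CQu-BPDD and RDD2020. In addition, we construct a new large-scale bituminous pavement multi-label disease dataset named CQu-BPMDD, which contains 38,994 high-resolution pavement disease images, to further evaluate the robustness of the model. Extensive experiments show that DDACDN outperforms state-of-the-art pavement crack detection methods in predicting crack locations in the target domain.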
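Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning during language model pretraining (Radford et al., 2019). Can zero-shot generalization instead be induced directly by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language task into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts that use diverse wording. These prompted datasets allow us to benchmark a model's ability to perform completely unseen tasks. We present a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) trained on a mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16 times its size. Furthermore, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6 times its size. All prompts and trained models are available at https://github.com/bigscience-workshop/promptsource and https://huggingface.co/bigscience/t0pp.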
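Error-bounded lossy compression is becoming an indispensable technique for the success of today's scientific projects, which produce vast volumes of data during simulations or instrument data acquisition. It can not only reduce data size significantly but also control the compression error according to user-specified error bounds. Autoencoder (AE) models have been widely used in image compression, but few AE-based compression approaches support the error-bounding feature that scientific applications require. To address this issue, we explore the use of convolutional autoencoders to improve error-bounded lossy compression of scientific data, and make the following three key contributions. (1) We conduct an in-depth investigation of the characteristics of various autoencoder models and develop an error-bounded, autoencoder-based framework built on the SZ model. (2) We optimize the compression quality of the main stages of our AE-based error-bounded compression framework, fine-tuning the block size and latent size and optimizing the compression efficiency of the latent vectors. (3) We evaluate the proposed solution on five real-world scientific datasets and compare it with six other related works. Experiments show that our solution achieves very competitive compression quality among all the compressors tested. In absolute terms, it obtains much better compression quality than SZ2.1 and ZFP in high-compression-ratio cases (a 100%~800% improvement in compression ratio at the same data distortion).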
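To make the notion of a user-specified error bound concrete, the following is a minimal sketch of residual quantization in the style of prediction-based compressors. It is not the authors' implementation: the function names, the noisy stand-in for the predictor output, and the bound `eb` are all assumptions made for illustration. The idea is that if the residual between the data and a prediction is quantized with bins of width `2 * eb`, every reconstructed value stays within `eb` of the original.

```python
import math
import random


def quantize_residuals(data, predicted, error_bound):
    """Map each residual (data - prediction) to an integer bin of width 2 * error_bound."""
    bin_width = 2.0 * error_bound
    return [round((x - p) / bin_width) for x, p in zip(data, predicted)]


def reconstruct(predicted, codes, error_bound):
    """Rebuild the data from the prediction plus the dequantized residuals."""
    bin_width = 2.0 * error_bound
    return [p + c * bin_width for p, c in zip(predicted, codes)]


# Hypothetical demo: a sine wave as "scientific data" and a noisy copy
# standing in for the output of a predictor such as an autoencoder.
random.seed(0)
data = [math.sin(0.1 * i) for i in range(200)]
predicted = [x + random.uniform(-0.05, 0.05) for x in data]

eb = 1e-3  # assumed user-specified absolute error bound
codes = quantize_residuals(data, predicted, eb)
recon = reconstruct(predicted, codes, eb)

# Largest pointwise error; stays within eb up to floating-point rounding.
max_err = max(abs(x - r) for x, r in zip(data, recon))
print(max_err <= eb + 1e-12)
```

The integer codes are what a real compressor would go on to entropy-code; a better predictor yields codes concentrated near zero and therefore a higher compression ratio at the same bound.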
Nowadays, floods of time-stamped web documents related to a general news query spread throughout the Internet, and timeline summarization aims to concisely summarize the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the event-level attention feature during generation, with sequential information retained, and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting the summary, so that the extracted summary also follows the time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
Diagram object detection is the key basis of practical applications such as textbook question answering. Because a diagram mainly consists of simple lines and color blocks, its visual features are sparser than those of natural images. In addition, diagrams usually express diverse knowledge, so they contain many low-frequency object categories. As a result, traditional data-driven detection models are not suitable for diagrams. In this work, we propose a gestalt-perception transformer model (GPTR) for diagram object detection, which is based on an encoder-decoder architecture. Gestalt perception comprises a series of laws that explain human perception: the human visual system tends to perceive patches in an image that are similar, close, or connected without abrupt directional changes as a single perceptual object. Inspired by these ideas, we build a gestalt-perception graph in the transformer encoder, which is composed of diagram patches as nodes and the relationships between patches as edges. This graph aims to group these patches into objects via the laws of similarity, proximity, and smoothness implied in these edges, so that meaningful objects can be effectively detected. The experimental results demonstrate that the proposed GPTR achieves the best results in the diagram object detection task. Our model also obtains results comparable to those of competitors in natural image object detection.